Integrating morpho-syntactic features in English-Arabic statistical machine translation
Abstract
This paper presents a hybrid approach to improving the quality of English-to-Arabic statistical machine translation. Machine translation is the use of computer software to translate text from one natural language to another. Arabic is a morphologically rich, highly inflectional language: the same root can yield many surface forms depending on its context. Statistical machine translation (SMT) engines often handle syntax poorly, especially when the language involved is morphologically rich, as Arabic is. To overcome these shortcomings, we describe a hybrid approach that integrates knowledge of the Arabic language into statistical machine translation. Within this framework, we propose the use of a statistical feature language model, SFLM (Smaïli et al., 2004), which makes it possible to integrate syntactic and grammatical knowledge about each word. We first discuss some challenges in translating from English to Arabic and explore various techniques to improve performance on this task. We then apply a morphological segmentation step to Arabic words and identify the morpho-syntactic class of each segmented word in order to build our statistical feature language model. Finally, we propose a scheme for recombining segmented Arabic words and describe the effect of these techniques on translation.
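The recombination step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' scheme. It assumes a marker convention that the abstract does not specify: a token ending in "+" is a prefix clitic and a token starting with "+" is a suffix clitic. The function name `recombine` and the Buckwalter-style transliterated tokens are hypothetical, and the orthographic adjustments that real Arabic recombination may require are ignored.

```python
def recombine(tokens):
    """Join segmented tokens back into surface words.

    Assumed convention (not taken from the paper):
      "w+"  -> prefix clitic, attaches to the following token
      "+hA" -> suffix clitic, attaches to the preceding word
    """
    words = []
    attach_next = False  # True when the previous token was a prefix
    for tok in tokens:
        is_prefix = len(tok) > 1 and tok.endswith("+")
        is_suffix = len(tok) > 1 and tok.startswith("+")
        # Drop the segmentation markers; a bare "+" is kept as-is.
        core = tok.strip("+") if (is_prefix or is_suffix) else tok
        if (is_suffix or attach_next) and words:
            words[-1] += core   # glue onto the word being built
        else:
            words.append(core)  # start a new surface word
        attach_next = is_prefix
    return words


segmented = "w+ s+ yktb +hA fy Albyt".split()
print(" ".join(recombine(segmented)))
```

Plain concatenation is the simplest possible choice here. A realistic system would also have to handle spelling changes at morpheme boundaries, which this sketch deliberately leaves out.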
Similar resources
The MIRACL Arabic-English Statistical Machine Translation
This paper describes the MIRACL statistical machine translation system and the improvements that were developed during the IWSLT 2010 evaluation campaign. We participated in the Arabic-to-English BTEC tasks using a phrase-based statistical machine translation approach. In this paper, we first discuss some challenges in translating from Arabic to English and we explore various techniques to impr...
Reduction of Morpho-Syntactic Features in Statistical Machine Translation of Highly Inflective Language
We address the problem of statistical machine translation from a highly inflective language to a less inflective one. The characteristics of inflective languages are generally not taken into account by statistical machine translation systems. Existing translation systems often treat different inflected word forms of the same lemma as if they were independent of each other, although some interdep...
Morpho-syntactic Arabic Preprocessing for Arabic to English Statistical Machine Translation
The Arabic language has far richer systems of inflection and derivation than English, which has very little morphology. This morphological difference causes a large gap between the vocabulary sizes in any given parallel training corpus. Segmenting inflected Arabic words is a way to smooth the language's highly morphological nature. In this paper, we describe some statistically and linguistically motivate...
Morphology In Statistical Machine Translation From English To Highly Inflectional Language
In this paper, we investigate the role of morphology in phrase-based statistical machine translation (SMT) from English to the highly inflectional Slovenian language. Translation to an inflectional language is a challenging task because of its morphological complexity. Rich morphology increases data sparsity and worsens the quality of statistical machine translation. The idea of the paper is to...
Applying Morphology to English-Arabic Statistical Machine Translation
We introduce two approaches to augmenting English-Arabic statistical machine translation (SMT) with linguistic knowledge. The first approach improves SMT by adding linguistically motivated syntactic features to particular phrases. These added features are based on the English syntactic information, namely part-of-speech tags and dependency parse trees. We achieved improvements of 0.2 and 0.6 in...